Cancellation of Non-Stationary Interfering Signals
for Speech Recognition

This invention relates to apparatus and method for
cancellation of non-stationary interfering signals. In
particular, the invention relates to cancellation of such
signals for the purpose of recovering a wanted speech signal
for use by a speech recognition application. The invention
is especially suitable for use in an automobile where in-car
devices produce interfering signals during the speech
recognition process.

A problem associated with speech recognition is that of
maintaining performance in the presence of interfering
signals so that the speech recognition process continues to
function satisfactorily even in the presence of background
noise. Known systems have been directed towards mitigating
effects of quasi-stationary noise such as telephone channel
noise or car noise. Proposed solutions to quasi-stationary
noise interference include spectral subtraction, Wiener
filtering and parallel model combination, each of which works
in the spectral domain.

There are, however, other sources of interference in
acoustic environments which may degrade performance of
speech recognition applications. In the example of an
automobile environment, in addition to engine noise, another
source of potentially interfering non-stationary acoustic
signals includes sound generated by electronic devices
operating in the car. Examples of such devices include
in-car entertainment accessories such as radios, compact disc
players and tape players, and also other types of devices
which may emit sonic signals, e.g. telephone ringing or
navigation system warning tones. In this specification,
electronic devices capable of emitting acoustic signals and
operating in a vehicle are generically referred to as
"Electronic in-car Acoustic Devices" (ECAD).

Sound generated by ECAD could be present when a user
wishes to control a device using a voice command. For
example, a radio may be playing in a car when the user wants
to use voice control of a navigation system or the radio
itself. In this case, the original interfering signal
produced by the radio is assumed to be known and accessible
but has passed through an unknown acoustic path between the
radio's loudspeakers and the speech recognition system's
microphone. The acoustic path may be determined by the
position of the loudspeakers and the microphone inside the
car as well as other factors, such as the number of
passengers and the presence of luggage inside the car.

Known systems which attempt to overcome the problem of
non-stationary interferers have been based on time domain
adaptive filters. However, although adaptive filtering may
produce satisfactory results, this approach suffers from a
number of disadvantages. Such disadvantages include high
computational requirements and slow convergence of adaptive
filtering algorithms. Simple forms of adaptive filtering
may require order 3N computations per sample. Such high
computational requirements can mean that complex hardware
may be required in order to perform the necessary filtering,
thereby increasing costs of devices incorporating such
technology to the consumer.

According to a first aspect of the present invention,
there is provided apparatus for cancellation of one or more
non-stationary interfering signals for speech recognition,
said apparatus comprising:

means for receiving an acoustic signal;
means for generating an estimated value of a magnitude
spectrum of said non-stationary interfering signals; and

means for subtracting said estimated value from said
received acoustic signal to produce a representation of a
wanted speech magnitude spectrum.

Preferably, said means for generating said estimated value
includes processing means configured to estimate a transfer
function for an acoustic channel between each source of said
non-stationary interfering signals and said means for
receiving an acoustic signal.

Preferably, said processing means is configured to
estimate transfer functions for non-stationary interfering
signals produced by left and right stereo channel
transmissions.

Preferably, said estimation of said transfer functions
is achieved by said processing means executing an iterative
algorithm on a frame-by-frame basis, the frames being
constituted by successive time periods.

Preferably, said processing means is configured to
estimate magnitudes of said left and right channel
interference signals,

said magnitude of left channel interference signal is
estimated by subtracting said right channel interference
signal magnitude estimated during previous said iteration
from said acoustic signal received at current said
iteration; and

said magnitude of right channel interference signal is
estimated by subtracting said left channel interference
signal magnitude estimated during previous said iteration
from said acoustic signal received at current said
iteration.

Preferably, said transfer function estimate for said
right stereo acoustic channel is determined by dividing said
right channel interference magnitude estimate by said
interfering signal transmitted from said right acoustic
stereo channel; and

said transfer function estimate for said left stereo
acoustic channel is determined by dividing said left channel
interference magnitude estimate by said interfering signal
transmitted from said left acoustic stereo channel.

Preferably, said right acoustic channel transfer
function estimation is performed for a said iteration only
if a ratio of total energy of said right acoustic stereo
channel interfering signal over total energy of said left
acoustic stereo channel interfering signal exceeds a
predetermined threshold value; and

said left acoustic channel transfer function estimation
is performed for a said iteration only if a ratio of total
energy of said left acoustic stereo channel interfering
signal over total energy of said right acoustic stereo
channel interfering signal exceeds a predetermined threshold
value.

Preferably, said ratio and threshold comparisons are
applied to individual frequency components in spectra of
said signals.

Preferably, said left and right stereo acoustic channel
transfer functions are multiplied by (1 - |η(k)|), where η(k)
is coherence of said left and right interfering signals at
a frequency index k.

Preferably, said transfer function estimate for said
right stereo acoustic channel is obtained using an
expression:

and said transfer function estimate for said left stereo
acoustic channel is obtained using an expression:

wherein $R''(k) = R(k) - H_{CR}(k)\,C(k)$, with C(k) being a
common component of said left and right stereo channel
signals and $H_{CR}(k)$ is a transfer function between common
said left and right stereo channel transmissions and said
right stereo channel signal, and
$L''(k) = L(k) - H_{CL}(k)\,C(k)$, where $H_{CL}(k)$ is a transfer
function between common said left and right stereo channel
transmissions and said left stereo channel signal.

Preferably, said processing means further comprises
means for smoothing said estimated transfer functions in
time domain.

Preferably, said means for smoothing in time domain
comprises a first order recursive filter.

Preferably, said processing means further comprises
means for smoothing said estimated transfer functions in
frequency domain.

Preferably, said means for smoothing in frequency
domain comprises a Finite Impulse Response filter.

Preferably, said processing means includes means for 
performing a Fourier Transform. 

Preferably, said non-stationary interfering signals are
produced by an electronic acoustic device operating in a
vehicle.

Preferably, said means for receiving an acoustic signal
comprises a microphone.

According to a second aspect of the present invention
there is provided a method of cancellation of one or more
non-stationary interfering signals for speech recognition,
said method comprising steps of:

receiving an acoustic signal;

generating an estimated value for a magnitude spectrum
of said non-stationary interfering signal; and

subtracting said estimated value from said received
acoustic signal to produce a representation of a wanted
speech magnitude spectrum.

within an automobile. For the purposes of the description,
it is generally assumed that a phase of the interferer
signal is not required at the speech recognition system, as
recognition feature sets such as cepstra do not normally
contain phase information.

The invention may be performed in various ways and, by
way of example only, a specific embodiment thereof will now
be described, reference being made to the accompanying
drawings, in which:

Figure 1 illustrates schematically an example of an
automobile environment having an ECAD where a speech
recognition system is used to control an in-car device;

Figure 2 illustrates a flow diagram representing steps
which may be used to estimate transfer functions
representing a model of an in-car acoustic channel;

Figure 3 illustrates schematically components which may
be used to implement a refinement of the algorithm in Figure
2;

Figure 4 illustrates a block diagram representing a
specific embodiment of the present invention; and

Figures 5 to 8 illustrate examples of microphone
signals obtained during experimental use of the present
invention.

Figure 1 illustrates schematically a simple situation
in which stereo ECAD signals are transmitted from separate
loudspeakers. Left stereo signal L(jω) is transmitted from
left loudspeaker 101 and right stereo signal R(jω) is
transmitted from right stereo speaker 102.



Equation (3)

The following conclusions may be drawn from equation (3):

• In the case of a mono transmission being output through
loudspeakers 101 and 102 whilst the user is saying a voice
command, signals L(jω) and R(jω) are completely correlated
with each other whilst being completely uncorrelated with
S(jω). In this case, individual left and right channel
transfer functions cannot be uniquely determined, but a
composite estimate which contains terms due to both left and
right channels can be obtained. This is sufficient for
practical cancellation of the mono ECAD signal output
through the two loudspeakers received at the microphone.

• If L(jω), R(jω) and S(jω) are all uncorrelated, a
correct estimate of the channel response will be obtained
because second and third terms in equation (3) will normally
have long term averages of 0.

• If L(jω) and R(jω) are partially correlated, left and
right acoustic channels cannot be unambiguously estimated.
However, if L(jω) and R(jω) occupy different spectral
regions or if corresponding time domain signals l(t) and
r(t) have periods where one has low energy whilst the other
has high energy, it may still be possible to make useful
estimates of left and right channels for purposes of
cancellation.

The frequency domain estimation of the right acoustic
channel response given by equation (3), and a corresponding
equation for the left acoustic channel transfer function,
$H_{AL}(j\omega)$, may be used to obtain an estimate of the
magnitude of the wanted speech spectrum S(jω). An estimate
of the wanted speech magnitude spectrum may be obtained by
subtracting the estimates of the left and right acoustic
channels of the ECAD signals from the acoustic signal Y(jω)
received at the microphone:

Equation (4)
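
The subtraction described above can be sketched per frequency
bin as follows. This is a minimal illustration under stated
assumptions, not a reproduction of equation (4): the function
name, the argument names and the clamping of negative results
at a floor value are all introduced here for illustration.

```python
def subtract_interference(y_mag, l_mag, r_mag, h_left, h_right, floor=0.0):
    """Estimate the wanted speech magnitude spectrum for one frame.

    All arguments are equal-length lists indexed by frequency bin:
    y_mag   - magnitude spectrum of the microphone signal
    l_mag   - magnitude spectrum of the left ECAD source signal
    r_mag   - magnitude spectrum of the right ECAD source signal
    h_left  - estimated left acoustic channel magnitude response
    h_right - estimated right acoustic channel magnitude response
    """
    speech = []
    for y, l, r, hl, hr in zip(y_mag, l_mag, r_mag, h_left, h_right):
        estimate = y - hl * l - hr * r
        # Clamping at `floor` is an assumption: a magnitude cannot be negative.
        speech.append(max(estimate, floor))
    return speech
```

Where the interference estimate exceeds the microphone
magnitude in a bin, this sketch simply returns the floor value
for that bin.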

An estimate of the acoustic channel power transfer
function for the right acoustic channel, derived by squaring
equation (3), may be as follows:

Equation (5)

A corresponding estimate of the acoustic channel power
transfer function for the left acoustic channel can also be
derived by those skilled in the art.

An iterative approach, coupled with time and frequency
dimension smoothing of the estimates of the channel
response, may be used to overcome problems caused by left
and right signal correlation described herein above.
Another problem which may need to be addressed arises
because phase information in the channel response may be
ignored, as the phase of the interferer is not normally
required at the speech recognition system. As noted above,
cancellation for the purpose of speech recognition only
requires an estimate of the magnitude of the speech spectrum
because the Mel Frequency Cepstral Coefficient (MFCC)
feature vector used by the speech recognition system in the
preferred embodiment is based on magnitude spectra. The MFCC
may be obtained by subjecting the speech spectrum in the
frequency domain to a fast Fourier transform in order to
obtain its power in various frequency slots. The value of
the power in the frequency domain is then passed through a
log function and then a cosine transform to obtain the
cepstrum, in which the elements are orthogonal.
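
The last two stages of that chain (log function, then cosine
transform) can be sketched as below. This is a minimal
illustration that assumes the power in each frequency slot has
already been computed; the Mel-spaced slot layout is omitted,
and the function name, the unnormalised DCT-II form and the
parameter names are assumptions made here.

```python
import math

def cepstrum_from_band_powers(band_powers, num_coeffs):
    """Log-compress band powers, then apply a DCT-II to obtain cepstral
    coefficients. `band_powers` must contain strictly positive values."""
    n = len(band_powers)
    log_powers = [math.log(p) for p in band_powers]
    coeffs = []
    for q in range(num_coeffs):
        total = 0.0
        for m, lp in enumerate(log_powers):
            # DCT-II basis function for coefficient q at slot m.
            total += lp * math.cos(math.pi * q * (m + 0.5) / n)
        coeffs.append(total)
    return coeffs
```

Because only powers enter the computation, the phase of the
spectrum plays no part in the resulting coefficients.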

Normally, the phase characteristic encodes a frequency
dependent delay spread associated with the acoustic transfer
function. In a car, typically the minimum delay is about
3 ms. The delay spread may be compensated when making the
channel estimate using equation (5). However, this
compensation may be unnecessary if the spectral evaluation
is done using a fast Fourier transformer with block length
much greater than the channel delay.

A practical form of the cancellation of non-stationary
interferer signals such as those produced by ECAD may
therefore be achieved using an algorithm 200 as illustrated
by steps in Figure 2 of the accompanying drawings. In the
preferred embodiment, the steps 201 to 205 are repeated once
for each single frame (i.e. a signal received at the
microphone in a fixed period of time); however, initialisation
steps 201 and 202 may only be performed for a first frame.
At step 201, estimates of magnitudes of the left and right
channel transfer functions, $\hat{H}_{AL}(j\omega)$ and
$\hat{H}_{AR}(j\omega)$, are initialised (set to zero):

$\hat{H}_{AL}(j\omega) = \hat{H}_{AR}(j\omega) = 0$

At step 202, estimates of magnitude of left and right
channel interference, $C_L$ and $C_R$, are initialised:

At step 203, new estimates of magnitudes of the left
and right interference signals at the microphone are
calculated. This is achieved for the left interference signal
by subtracting the channel estimate of the magnitude of the
right channel (calculated during the algorithm iteration for
the immediately previous frame) from the microphone signal
received at the current iteration (n). For the right
interference channel, the magnitude estimate for the left
channel derived during the previous iteration (n-1) is
subtracted from the microphone signal:

(Equation 6)
(Equation 7)
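
Step 203 as described in words above can be sketched as
follows. This is a minimal sketch, assuming that the
subtraction is carried out on magnitudes bin by bin and that
negative results are clamped at zero; the function and
argument names are hypothetical, and the code is not a
reproduction of equations (6) and (7).

```python
def update_interference_estimates(y_mag, c_left_prev, c_right_prev):
    """Step 203 sketch: return new (c_left, c_right) magnitude estimates.

    y_mag        - microphone magnitude spectrum for the current frame n
    c_left_prev  - left interference magnitude estimate from frame n-1
    c_right_prev - right interference magnitude estimate from frame n-1
    """
    # Left estimate: microphone magnitude minus the previous right estimate.
    c_left = [max(y - cr, 0.0) for y, cr in zip(y_mag, c_right_prev)]
    # Right estimate: microphone magnitude minus the previous left estimate.
    c_right = [max(y - cl, 0.0) for y, cl in zip(y_mag, c_left_prev)]
    return c_left, c_right
```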

At step 204, rough estimates of the left and right
transfer functions, $\hat{H}_{AL}(j\omega)$ and
$\hat{H}_{AR}(j\omega)$, are made. This is achieved for the
left channel transfer function by dividing the estimated left
interference signal calculated at step 203 by the signal
transmitted from the left stereo acoustic channel. For the
right transfer function, the right channel interference
signal estimate calculated at step 203 is divided by the
signal transmitted from the right acoustic stereo channel:



(Equation 8)



(Equation 9)



Substituting equations (6) and (7) into the terms for
the estimated interference signals in equations (8) and (9),
respectively, gives expressions used to provide rough
estimates of the left and right channel transfer functions:
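
The printed expressions are not reproduced in this text. One
plausible reading of the substitution, following only the
verbal descriptions of steps 203 and 204, is given below. The
symbols $\hat{C}_{L,n}$ and $\hat{C}_{R,n}$ (interference
magnitude estimates at frame $n$) and the use of magnitudes
throughout are assumptions, so this should be read as a sketch
rather than as the original equations.

```latex
\hat{H}_{AL,n}(j\omega)
  = \frac{\hat{C}_{L,n}(j\omega)}{|L_n(j\omega)|}
  = \frac{|Y_n(j\omega)| - \hat{C}_{R,n-1}(j\omega)}{|L_n(j\omega)|},
\qquad
\hat{H}_{AR,n}(j\omega)
  = \frac{\hat{C}_{R,n}(j\omega)}{|R_n(j\omega)|}
  = \frac{|Y_n(j\omega)| - \hat{C}_{L,n-1}(j\omega)}{|R_n(j\omega)|}.
```
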
At step 205 the rough estimates of the channel transfer
functions obtained at step 204 may be smoothed, preferably
both in the time and frequency domains. Time smoothing is
preferably achieved with a first order recursive filter
using a time constant of several hundred milliseconds. For
example, time smoothing for the right channel may be as
follows (a similar equation for the left channel may also be
obtained):
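
A first order recursive smoother of the kind described can be
sketched as below. This is a minimal illustration, not the
original equation: the exponential mapping from time constant
to filter coefficient, the function names and the symbol
`alpha` are assumptions introduced here.

```python
import math

def smoothing_coefficient(frame_period_s, time_constant_s):
    """Map a time constant to the pole of a first order recursive filter."""
    return math.exp(-frame_period_s / time_constant_s)

def smooth_in_time(previous_smoothed, rough, alpha):
    """One recursive update per frequency bin:
    smoothed[n] = alpha * smoothed[n-1] + (1 - alpha) * rough[n]."""
    return [alpha * p + (1.0 - alpha) * r
            for p, r in zip(previous_smoothed, rough)]
```

With a 16 ms frame period and a time constant of 300 ms
(one value within "several hundred milliseconds"), the
coefficient is close to 0.95, so each new rough estimate
contributes only about 5% to the smoothed value.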

Frequency smoothing is preferably achieved using a
Finite Impulse Response filter (represented by f(ω) in an
equation herein below) with a triangular impulse response
covering about 300 Hertz. Frequency smoothing for the right
channel may be as follows (a similar expression for the left
channel may also be obtained):
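
Smoothing across frequency with a triangular FIR kernel can be
sketched as below. This is a minimal illustration under
assumptions: the kernel width is given in bins (how many bins
span about 300 Hz depends on the sample rate and transform
length, which are not fixed here), edges are handled by
repeating the edge value, and all names are hypothetical.

```python
def triangular_kernel(half_width):
    """Normalised triangular FIR taps of length 2 * half_width + 1."""
    taps = [half_width + 1 - abs(i) for i in range(-half_width, half_width + 1)]
    total = sum(taps)
    return [t / total for t in taps]

def smooth_in_frequency(h, half_width=2):
    """Convolve a channel estimate with a triangular kernel across bins.
    Bins beyond the edges are handled by repeating the edge value
    (an assumption made for this sketch)."""
    taps = triangular_kernel(half_width)
    last = len(h) - 1
    out = []
    for k in range(len(h)):
        acc = 0.0
        for i, t in enumerate(taps):
            idx = min(max(k + i - half_width, 0), last)
            acc += t * h[idx]
        out.append(acc)
    return out
```

Normalising the taps to sum to one leaves a flat channel
estimate unchanged, so the smoothing only redistributes local
peaks and dips.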

The cancellation algorithm 200 described in steps 201
to 205 herein above may be refined by means of the four ways
described herein below in order to attempt to deal with
problems highlighted by equation (3) concerning correlation
of left and right channel signals:

1. Updating of the recursive filter providing the smoothed
channel estimate can be inhibited unless energy of one
channel greatly exceeds energy of the other channel. This
is preferably achieved by updating the left or right channel
response only when it is assumed that only left or right
channel, respectively, is active. Thus, a new right
acoustic channel transfer function would be estimated at
step 204 if a ratio of the total energy of the signal
transmitted from the right acoustic stereo channel to the
total energy of the signal transmitted from the left stereo
acoustic channel exceeds a predetermined threshold value;
otherwise the estimate calculated for the transfer function
during the previous frame iteration is used. A
corresponding estimation would also be performed for the
left transfer function.

Let $E_L$ represent the total energy in the nth frame
of the left stereo acoustic channel and $E_R$ represent the
total energy in the nth frame of the right stereo acoustic
channel. Then the channel response estimation algorithm
for the right channel is:



otherwise use previous estimate ($\hat{H}_{AR,n-1}$) if
$E_R/E_L$ < Threshold.

The channel response estimation algorithm for the left
channel is:



otherwise use previous estimate ($\hat{H}_{AL,n-1}$) if
$E_L/E_R$ < Threshold.

Normally, when considering the right channel, when the
threshold is exceeded, Y(jω) should consist mainly of terms
due to the right channel and the wanted speech signal.
Y(jω) should contain very little energy due to the left
channel if the threshold is set at a high value. The reverse
normally holds when considering the left channel. Time and
frequency domain smoothing substantially as described at
step 205 would also be used.
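
The whole-frame gating rule just described can be sketched as
follows. This is a minimal illustration with hypothetical
function and argument names; writing the ratio test as a
multiplication (to avoid dividing by a silent channel) is a
choice made here, not something stated in the text.

```python
def frame_energy(samples):
    """Total energy of one frame of time domain samples."""
    return sum(s * s for s in samples)

def gated_update(h_prev, h_new, e_own, e_other, threshold):
    """Accept the new channel estimate only when the energy ratio
    e_own / e_other exceeds the threshold; otherwise keep the previous
    estimate. The comparison is written without a division so that a
    silent opposite channel (e_other == 0) does not raise an error."""
    if e_own > threshold * e_other:
        return h_new
    return h_prev
```

For the right channel, `e_own` would be the right channel
frame energy and `e_other` the left channel frame energy; the
roles are swapped for the left channel.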

2. Updating of the recursively smoothed channel estimate at
particular frequencies can be inhibited unless energy at
that frequency in one channel greatly exceeds the energy at
that frequency in the other channel. This may be achieved
by estimating new values for the left and/or right acoustic
channel transfer functions when a ratio of the total
energies of the left and right stereo acoustic signals
exceeds a given threshold at individual frequency components
in the spectrum. Preferably, the threshold may apply to
frequencies comprising a harmonic number in the Discrete
Fourier Transforms of the signals.

Using a similar terminology to that in 1. herein above,
the channel response estimation algorithm for the right
channel is:



Otherwise use estimate at previous iteration
($\hat{H}_{AR,n-1}$) if $E(k)_R/E(k)_L$ < Threshold.

The channel response estimation algorithm for the left
channel is:



Otherwise use the estimate calculated at the previous
iteration ($\hat{H}_{AL,n-1}$) if $E(k)_L/E(k)_R$ < Threshold.

In this definition, the index k refers to the harmonic
number in the DFTs of the signals. For example, $E(k)_R$ is
the energy of the kth harmonic in the DFT of the right stereo
source signal. This algorithm should ensure that the
acoustic channel responses are only updated at those
frequencies and at those times at which the signal at the
microphone consists mainly of either left or right channel.
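
The per-frequency version of the gating rule can be sketched
as below. This is a minimal illustration with hypothetical
names, again assuming the ratio test is written as a
multiplication so that an empty bin in the other channel causes
no division error.

```python
def gated_update_per_bin(h_prev, h_new, e_own, e_other, threshold):
    """For each frequency index k, take h_new[k] only where
    e_own[k] exceeds threshold * e_other[k]; elsewhere keep h_prev[k]."""
    return [
        new if own > threshold * other else prev
        for prev, new, own, other in zip(h_prev, h_new, e_own, e_other)
    ]
```
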
3. Evaluate a coherence function between the left and right
channel signals and use the inverse magnitude of the
coherence at each frequency as a weighting on the amount by
which estimates of the channel responses are updated at that
frequency. The coherence function provides a measure of
correlation over a period of time of phases of two different
signals measured at a particular frequency. The coherence
function may be used in various ways, normally based on the
idea that the update of the acoustic channel response
will be decreased if the left and right stereo channels are
phase-correlated at a particular frequency. If the
coherence approaches unity, the signals are correlated, but
only at the specified frequency. Thus, the channel response
estimates for the right channel may be derived from the
following algorithm (a corresponding method for the transfer
function for the left channel may also be derived):



where η(k) is the coherence of the left and right stereo
source signal at frequency index k:

$\eta(k) = E\left\{ \dfrac{L(k)\,R^{*}(k)}{|L(k)|\,|R(k)|} \right\}$

where the expectation is over time.
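
A coherence estimate of this form, with the expectation
approximated by an average over a finite set of frames, can be
sketched as follows. This is a minimal illustration: the
finite average, the treatment of zero-valued bins and the
function names are assumptions, and the weight (1 - |η(k)|)
follows the multiplication factor stated earlier for the
channel transfer functions.

```python
def phase_coherence(l_frames, r_frames):
    """Average, over frames, of L(k) * conj(R(k)) / (|L(k)| * |R(k)|).

    l_frames, r_frames - lists of frames, each frame a list of complex
    DFT values. Bins where either signal is zero contribute nothing.
    Returns one complex coherence value per frequency index k.
    """
    num_frames = len(l_frames)
    num_bins = len(l_frames[0])
    eta = [0j] * num_bins
    for l_frame, r_frame in zip(l_frames, r_frames):
        for k in range(num_bins):
            denom = abs(l_frame[k]) * abs(r_frame[k])
            if denom > 0.0:
                eta[k] += l_frame[k] * r_frame[k].conjugate() / denom
    return [e / num_frames for e in eta]

def coherence_weights(eta):
    """Weight (1 - |eta(k)|) applied to the channel update at each bin."""
    return [1.0 - abs(e) for e in eta]
```

When the two channels keep a fixed phase relationship at a
bin, |η(k)| approaches one and the weight approaches zero, so
the channel estimate at that bin is barely updated.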
4. Extract those components of the left and right ECAD
source signals which are uncorrelated (orthogonal) and use
them to make estimates of the left and right channel
responses. In this approach, a common component C(k) in the
left and right ECAD sources is removed by adaptive filtering
to yield an orthogonal pair of signals, L''(k) and R''(k):

$R(k) = R''(k) + H_{CR}(k)\,C(k)$
$L(k) = L''(k) + H_{CL}(k)\,C(k)$

wherein $H_{CL}(k)$ is the transfer function between the
common ECAD signal source (left and right stereo signals
combined, which may be fixed in a recording studio) and the
left ECAD signal source, and $H_{CR}(k)$ is the transfer
function between the common ECAD source signal and the right
ECAD source.

The orthogonalised signals are used to make the
acoustic channel response estimates. For the right stereo
channel transfer function the following expression may be
used (a corresponding expression for the left stereo channel
transfer function may also be obtained):

Most of the terms are long term uncorrelated so we get
the true acoustic channel response.

Thus, the right stereo acoustic channel function,
$\hat{H}_{AR}(j\omega)$, may be obtained by dividing the signal
received at the microphone by R''(k).

Figure 3 of the accompanying drawings illustrates
schematically an example of components which may be used to
form L''(jω) and R''(jω). The components include two
adaptive filters, 303 and 304, implemented either in the
frequency domain or, preferably, the time domain. The
coefficients of each FIR adaptive filter are adjusted, using
LMS or similar, to minimise the total energy in r''(n) and
l''(n), respectively, i.e. the filters are operated in
standard system identification mode as in echo cancelling etc.

The right stereo ECAD signal r(n) 301 is fed into
adaptive filter 303 and a combiner 305. The left stereo
ECAD signal l(n) 302 is fed into adaptive filter 304 and
combiner 306. The output of adaptive filter 303 is also fed
into combiner 306. The output of adaptive filter 304 is
also fed into combiner 305. The output of combiner 305 may
be fed back via an adaption control path into adaptive
filter 304. The output of combiner 306 may be fed back into
adaptive filter 303 via an adaption control path. The
output of combiner 305 comprises the orthogonal right stereo
signal r''(n) 307. The output of combiner 306 comprises the
left stereo orthogonal signal l''(n) 308.
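
One branch of the arrangement of Figure 3 can be sketched in
the time domain as below. This is a minimal sketch, assuming a
normalised LMS update (the text says only "LMS or similar");
the function name, tap count, step size and regularisation
constant are hypothetical choices made for illustration.

```python
def nlms_residual(reference, target, num_taps=4, step=0.5, eps=1e-8):
    """Adapt an FIR filter so that filtered `reference` predicts `target`,
    and return (residual, weights). The residual is the part of `target`
    that is not linearly predictable from `reference`."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference sample first
    residual = []
    for x, d in zip(reference, target):
        history = [x] + history[:-1]
        prediction = sum(w * h for w, h in zip(weights, history))
        error = d - prediction
        residual.append(error)
        # Normalised LMS: scale the update by the input energy in the filter.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
    return residual, weights
```

Under these assumptions, calling `nlms_residual(l, r)` would
play the role of adaptive filter 304 with combiner 305,
yielding r''(n), and `nlms_residual(r, l)` the role of adaptive
filter 303 with combiner 306, yielding l''(n).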

Figure 4 of the accompanying drawings illustrates a
block diagram representing a specific embodiment of the
present invention. Processing components of Fig. 4 may be
electronic processors fitted integrally to the in-car device
where the speech recognition system is located or,
alternatively, may be a stand alone electronic device
intended to receive acoustic signals, cancel non-stationary
interfering signals and output a filtered acoustic signal to
be received by the speech recognition system's microphone.

ECAD sound source 401 (such as the signals output by
loudspeakers 101 and 102 of Figure 1) may be received
directly by a spectral analysis process 404 so that the
signal as produced by the ECAD prior to transmission through
the in-car acoustic channel 403 may be analysed. The ECAD
signal is also received by a spectral analysis process 405
after transmission through acoustic channel 403 so that the
signal 401 is in effect simultaneously spectrally analysed
before and after transmission through the acoustic channel
403. The spectral analysis of processes 404 and 405 is
preferably carried out at a 16 ms frame rate using a 256
point Fast Fourier Transformer. If user speech 402
(corresponding to wanted speech signal S(jω) 104 of Figure 1)
is also present then this acoustic signal will also be
transmitted through the acoustic channel 403 and received by
spectral analysis process 405.
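
Frame-based spectral analysis of this kind can be sketched as
below. This is a minimal illustration under several
assumptions: a direct DFT is used in place of a Fast Fourier
Transform to keep the sketch dependency-free, a Hann window is
applied, and the default hop of 128 samples would correspond to
a 16 ms frame advance only at an 8 kHz sample rate, which the
text does not state. All names are hypothetical.

```python
import cmath
import math

def frame_magnitude_spectra(samples, frame_len=256, hop=128):
    """Split `samples` into overlapping Hann-windowed frames and return the
    magnitude of bins 0..frame_len/2 for each frame (plain O(N^2) DFT)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        spectra.append(mags)
    return spectra
```

The same routine would be applied to the ECAD source signals
and to the microphone signal so that their spectra are aligned
frame by frame.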

The outputs of spectral analysis processes 404 and 405
are used as inputs to acoustic channel model estimation
process 406 which preferably functions in accordance with
algorithm 200 described herein above. Acoustic channel
model estimation process 406 produces an acoustic channel
model 407 which may be used as an input to a spectral
subtraction process 408 which also receives the acoustic
signal transferred through channel 403.

When the speech recognition system is required, the
acoustic channel model 407 is frozen for the duration of the
speech recognition process. The acoustic channel model 407
is then used to recover the speech signal from the
microphone signal by subtracting the estimated spectrum of the
ECAD interfering signals contained in the model 407 from the
acoustic signals received at the microphone. The spectrally
subtracted signal representing the recovered wanted speech
409 is then passed to a pattern matcher process 410 (part of
the speech recognition system) which may use recognition
feature sets such as Hidden Markov models 411 in order to
match the recovered speech signal 409 to a command which is
recognised by the system. The pattern matcher 410 may then
pass on an output signal to trace back and decision process
412 in order that the user's speech command be carried out
by the device.

Since the spectral subtraction algorithm is frame
rather than sample based, its computational complexity is
low. The algorithm's main computation is required for the
Fast Fourier Transform, which requires order N log N
computations per frame for each channel. This is typically only
about 2 3 03c computations per second, which is significantly
lower than the order 3N computations per sample required by
the simplest form of known adaptive filter technique. For
an echo tail length of 32 milliseconds (256 samples), this
equates to more than 13 million operations per second.

Figures 5 to 8 of the accompanying drawings illustrate
microphone signal traces before and after the non-stationary
interferer signal cancellation for different types of music
output by the ECAD at different signal to interference
ratios. In order to allow for comparison between an
uncancelled signal passed through the acoustic channel and
the cancelled signal, test data was constructed by recording
speech and interferer signals separately in the same car
environment and then adding the two signals. In the
examples shown in Figures 5 to 8, the interfering music is
a stereo signal.

Figures 5A to 5D of the accompanying drawings
illustrate microphone traces with and without cancellation in a
case where the ECAD outputs pop music at 0 dB signal to
interference ratio. In Fig. 5A a signal received at the
microphone prior to cancellation is illustrated. In this
case, peak segmental speech and interferer levels are the
same. This is a highly pessimistic way of estimating
signal-to-noise ratio as amplitude variability of the speech
signal is higher than that of the ECAD music signal output,
which exceeds the speech for a considerable part of the
example. Fig. 5B illustrates a signal resulting from an
inverse transformation of the signal of Fig. 5A after
spectral subtraction. The interfering signal as shown in
Fig. 5B has clearly been reduced. Fig. 5C illustrates a
signal representing normalised squared cepstral distances
for application of the cancellation algorithm. Fig. 5D
illustrates a signal trace for the normalised squared
cepstral distances of Fig. 5C after spectral subtraction.
Comparing the traces illustrated in Fig. 5C and 5D, it can
be seen that the recovered speech cepstra are less
distorted than with the interferer.

Figures 6A to 6D of the accompanying drawings
illustrate microphone traces with and without cancellation in a
case where the ECAD outputs pop music at 10 decibel signal
to interference ratio. In Fig. 6A a signal received at the
microphone prior to cancellation is illustrated. Fig. 6B
illustrates a signal resulting from an inverse
transformation on the signal of 6A after spectral subtraction. The
interfering signal shown in Fig. 6B has clearly been
reduced. Fig. 6C illustrates a signal representing
normalised squared cepstral distances for application of the
cancellation algorithm.

reduced. Fig. 7C illustrates a signal representing
normalised squared cepstral distances for application of the
cancellation algorithm. Fig. 7D illustrates a signal trace
for the normalised squared cepstral distances of Fig. 7C
after spectral subtraction.

Figures 8A to 8D of the accompanying drawings illustrate
microphone traces with and without cancellation in a case
where the ECAD outputs opera music. Fig. 8B illustrates a
signal resulting from an inverse transformation on the
signal of 8A after spectral subtraction. The interfering
signal shown in Fig. 8B has clearly been reduced. Fig. 8C
illustrates a signal representing normalised squared
cepstral distances for application of the cancellation
algorithm. Fig. 8D illustrates a signal trace for the
normalised squared cepstral distances of Fig. 8C